Complexity Measures of Supervised Classification Problems

Authors

  • Tin Kam Ho
  • Mitra Basu
Abstract

We studied a number of measures that characterize the difficulty of a classification problem, focusing on the geometrical complexity of the class boundary. We compared a set of real-world problems to random labelings of points and found that real problems contain structures in this measurement space that are significantly different from the random sets. Distributions of problems in this space show that there exist at least two independent factors affecting a problem's difficulty. We suggest using this space to describe a classifier's domain of competence. This can guide static and dynamic selection of classifiers for specific problems as well as subproblems formed by confinement, projection, and transformations of the feature vectors.
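The abstract does not spell out the individual measures. As a hedged illustration only, the sketch below assumes a two-class problem and two measures commonly associated with this line of work: the maximum Fisher's discriminant ratio over single features, and the leave-one-out error rate of a 1-nearest-neighbour classifier (a rough proxy for how tangled the class boundary is). It then mimics the comparison against random labelings by shuffling the labels of the same points. The helper `make_two_blobs`, the synthetic data, and every parameter value are hypothetical and not taken from the paper.

```python
import math
import random
from statistics import mean, pvariance


def fisher_ratio(points, labels):
    """Max over features of (mu1 - mu2)^2 / (var1 + var2); two classes assumed."""
    classes = sorted(set(labels))
    assert len(classes) == 2, "this sketch handles two classes only"
    a = [p for p, y in zip(points, labels) if y == classes[0]]
    b = [p for p, y in zip(points, labels) if y == classes[1]]
    best = 0.0
    for j in range(len(points[0])):
        xa = [p[j] for p in a]
        xb = [p[j] for p in b]
        denom = pvariance(xa) + pvariance(xb)
        if denom > 0:  # skip features with zero spread in both classes
            best = max(best, (mean(xa) - mean(xb)) ** 2 / denom)
    return best


def loo_1nn_error(points, labels):
    """Leave-one-out error rate of a 1-nearest-neighbour classifier."""
    errors = 0
    for i, p in enumerate(points):
        # Nearest other point by Euclidean distance; ties keep the first index found.
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / len(points)


def make_two_blobs(n_per_class, separation, seed):
    """Hypothetical 2-D data: two unit-variance Gaussian classes along the x-axis."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, centre in ((0, 0.0), (1, separation)):
        for _ in range(n_per_class):
            points.append((rng.gauss(centre, 1.0), rng.gauss(0.0, 1.0)))
            labels.append(label)
    return points, labels


points, labels = make_two_blobs(100, separation=4.0, seed=0)

# A random labeling of the same points, analogous to the paper's random sets.
shuffled = labels[:]
random.Random(1).shuffle(shuffled)

results = {
    name: (fisher_ratio(points, y), loo_1nn_error(points, y))
    for name, y in (("structured", labels), ("random", shuffled))
}
for name, (f1, err) in results.items():
    print(f"{name:10s}  Fisher ratio={f1:.3f}  1-NN LOO error={err:.3f}")
```

On the structured labeling one would expect a large Fisher ratio and a small nearest-neighbour error, and on the shuffled labeling the opposite; a real study would compute many such measures over many datasets and look at where problems fall in the resulting space.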

Similar resources

When will Feature Feedback help? Quantifying the Complexity of Classification Problems

Supervised learning typically requires human effort to label a large number of training instances. Active learning strives to decrease the number of labeled training examples needed by actively engaging the learner and the human in an interactive process. Active learning has proven to be effective in many domains. With few training examples, past work has found that user prior knowledge on the ...

Dataset Complexity and Gene Expression Based Cancer Classification

When applied to supervised classification problems, dataset complexity determines how difficult a given dataset is to classify. Since complexity is a nontrivial issue, it is typically defined by a number of measures. In this paper, we explore complexity of three gene expression datasets used for two-class cancer classification. We demonstrate that estimating the dataset complexity before performin...

Partial Information and Distribution-Dependence in Supervised Learning Models

In this thesis we study two important supervised learning settings: linear classifiers with a margin, and Multiple-Instance Learning, and provide novel results concerning the ability to learn in each of these settings. In supervised learning, the goal is to learn to classify objects into one of several classes, using only examples of objects, along with the class that they belong to (also terme...

Determining the accuracy in image supervised classification problems

A large number of accuracy measures for crisp supervised classification have been developed in supervised image classification literature. Overall accuracy, Kappa index, Kappa location, Kappa histo and user accuracy are some well-known examples. In this work, we will extend and analyze some of these measures in a fuzzy framework to be able to measure the goodness of a given classifier in a supe...

Domains of competence of the semi-naive Bayesian network classifiers

The motivation for this paper comes from observing the recent tendency to assert that rather than a unique and globally superior classifier, there exist local winners. Hence, the proposal of new classifiers can be seen as an attempt to cover new areas of the complexity space of datasets, or even to compete with those previously assigned to others. Several complexity measures for supervised clas...

Journal:
  • IEEE Trans. Pattern Anal. Mach. Intell.

Volume 24

Published 2002